The data are derived from reported crimes, classified according to the Maryland criminal code and documented in approved police incident reports. The dataset does not include information about the victims, and it masks the actual address, so the exact place where the complaint occurred is not disclosed.
Source: https://data.world/jboutros/montgomery-county-crime
Maryland County Area
Importing the Pandas and Bokeh libraries and configuring Bokeh to show charts inline (by calling the output_notebook() function)
In [406]:
import pandas as pd
import numpy as np
from bokeh.io import push_notebook, show, output_notebook
from bokeh.layouts import row
from bokeh.plotting import figure
from bokeh.models import (
    GMapPlot, GMapOptions, ColumnDataSource, Circle, DataRange1d, PanTool, WheelZoomTool, BoxSelectTool
)
output_notebook()
Configuring the map and loading data about where the complaints occurred. Note that to successfully configure Google Maps you have to create an API key (you can generate one at https://developers.google.com/maps/documentation/javascript/get-api-key) and set it in the line 'plot.api_key = ""'
In [407]:
map_options = GMapOptions(lat=39.151042, lng=-77.193023, map_type="roadmap", zoom=11)
plot = GMapPlot(x_range=DataRange1d(), y_range=DataRange1d(), map_options=map_options)
plot.title.text = "Montgomery County"
# For GMaps to function, Google requires you obtain and enable an API key:
#
# https://developers.google.com/maps/documentation/javascript/get-api-key
#
# Replace the value below with your personal API key:
plot.api_key = "AIzaSyBFHmpkUOfk2FtDZXHVBSUUHp6LVPmI-fs"
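Hard-coding the key in a shared notebook exposes it to anyone who reads the file. A minimal sketch of one alternative, assuming the key is stored in an environment variable (the variable name GOOGLE_MAPS_API_KEY and the helper load_api_key are hypothetical and not part of the original notebook):

```python
import os

def load_api_key(env_var="GOOGLE_MAPS_API_KEY"):
    """Return the API key stored in env_var, or raise a clear error if it is unset."""
    # env_var is a hypothetical name; use whatever variable you actually export
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Google Maps API key")
    return key
```

With this helper the assignment would become plot.api_key = load_api_key(), which keeps the key out of the saved notebook.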
Load the data using the read_csv function and configure which tools will be available in the plot.
In [408]:
#Loading dataset from Montgomery County complaint dataset
monty_data = pd.read_csv("MontgomeryCountyCrime2013.csv")
latitude_data = monty_data["Latitude"]
longitude_data = monty_data["Longitude"]
monty_data.head()
Out[408]:
Categorizing complaint classes
In [409]:
#Creating a master class to categorize crimes
classaux = monty_data["Class"]/100
classaux = classaux.astype(int)
classaux = classaux*100
#Inserting this new data in the dataset
monty_data["MasterClass"] = classaux
#print(monty_data.groupby("Class")["Class Description"].mean())
#Sort by Class of complaint to analyse master classes of Class complaints
#monty_data.sort_values(by="Class")
#monty_data.sort_values(by="Class Description")
#Selecting several columns requires a list of column names (double brackets)
monty_data[["Class","Class Description"]]
#print(monty_data.groupby["Class Description"])
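The master class computed in the cell above is the class code with its last two digits dropped. A minimal sketch of the same idea using integer division, on a toy frame whose class codes are illustrative and not taken from the dataset:

```python
import pandas as pd

# Toy class codes standing in for the "Class" column (illustrative values)
toy = pd.DataFrame({"Class": [611, 617, 1403, 2938]})

# Integer-divide by 100 to drop the last two digits, then scale back up
toy["MasterClass"] = (toy["Class"] // 100) * 100

master_classes = toy["MasterClass"].tolist()
print(master_classes)
```

Integer division keeps the column as integers throughout, so no intermediate float conversion is needed.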
In [410]:
source = ColumnDataSource(
    data=dict(
        lat=latitude_data[13:130],
        lon=longitude_data[13:130],
    )
)
#values is a method, so it must be called to print the column contents
print(source.data.values())
circle = Circle(x="lon", y="lat", size=15, fill_color="blue", fill_alpha=0.8, line_color=None)
plot.add_glyph(source, circle)
plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool())
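The cell above plots a fixed slice of the records (rows 13 to 129). If some records have missing or zero coordinates, which is an assumption and not something the notebook states, they could be filtered out before building the ColumnDataSource. A minimal sketch with a hypothetical helper valid_coordinates and illustrative toy values:

```python
import pandas as pd

def valid_coordinates(df, lat_col="Latitude", lon_col="Longitude"):
    """Keep rows whose latitude and longitude are present and non-zero."""
    has_values = df[lat_col].notna() & df[lon_col].notna()
    non_zero = (df[lat_col] != 0) & (df[lon_col] != 0)
    return df[has_values & non_zero]

# Toy records: one missing latitude and one (0, 0) placeholder
toy = pd.DataFrame({
    "Latitude": [39.15, None, 0.0, 39.08],
    "Longitude": [-77.19, -77.20, 0.0, -77.15],
})
clean = valid_coordinates(toy)
print(clean)
```

The filtered frame could then supply the lat and lon columns in place of the raw slices.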
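Plotting the geographic data in Google Maps. Note that the 'show' function receives an additional parameter, 'notebook_handle=True', which makes it return a handle that push_notebook can later use to update the plot; the inline rendering itself is enabled by output_notebook()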
In [411]:
show(plot,notebook_handle=True)
Out[411]:
In [413]:
#groupby(...).size() counts the records in each (Class, Class Description) group.
#Sorting the counts in descending order, taking the first n rows with head and resetting the index yields the n most frequent classes
top = monty_data.groupby(['Class','Class Description']).size().reset_index(name='frequency').sort_values("frequency", ascending=False).head(40).reset_index(drop=True)
#number_of_registries is not defined in the cells shown; it is assumed here to be the dataset shape (rows, columns)
number_of_registries = monty_data.shape
#Convert the counts to a percentage of all records
top['frequency'] = (top['frequency']/number_of_registries[0])*100
top
Out[413]:
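The frequency column above is each group's count divided by the total number of records, times 100. A minimal sketch of that calculation on toy data (the rows are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy records: three larcenies, two vandalism reports, one burglary
toy = pd.DataFrame({
    "Class": [611, 611, 611, 1403, 1403, 521],
    "Class Description": ["LARCENY", "LARCENY", "LARCENY",
                          "VANDALISM", "VANDALISM", "BURGLARY"],
})

# Count the rows in each (Class, Class Description) group
counts = toy.groupby(["Class", "Class Description"]).size()

# Convert the counts to a percentage of all rows and sort from most to least frequent
share = (counts / len(toy) * 100).sort_values(ascending=False)

# Keep the two most frequent groups as a regular DataFrame
toy_top = share.head(2).reset_index(name="frequency")
print(toy_top)
```

Summing the frequency column of such a table gives the share of all records covered by the listed groups.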
In [414]:
#The decimal context does not affect float arithmetic, so the percentage is rounded with round() only
parcial_perc = top['frequency'].sum()
parcial_perc = round(parcial_perc,2)
print("The crimes above are responsible for up to " + str(parcial_perc) + "% of the total crimes")
In [416]:
#Considering the top crimes
#Work on a copy so that later changes do not modify top
top_classes_top = top.copy()
#Creation of a Master Class: the class code rounded to the nearest hundred
top_classes_top['Master Class'] = (top_classes_top['Class']/100).round().astype(int)*100
top_classes_top
Out[416]:
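The master class here is obtained with round(), whereas the earlier MasterClass column was obtained by truncating with astype(int). The two approaches can disagree for class codes in the upper half of a hundred. A minimal sketch comparing them on illustrative codes:

```python
import pandas as pd

# Illustrative class codes (not taken from the dataset)
codes = pd.Series([612, 660, 1403, 1489])

# Truncation: drop the last two digits
truncated = (codes // 100) * 100

# Rounding: move to the nearest hundred
rounded = (codes / 100).round().astype(int) * 100

comparison = pd.DataFrame({"Class": codes, "truncated": truncated, "rounded": rounded})
print(comparison)
```

Codes such as 660 and 1489 land in different master classes depending on the method, so it is worth choosing one and using it consistently across cells.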
In [432]:
#Inserting the description of the Master Classes
master_class_descriptions = {
    500: 'BURGLARY',
    600: 'LARCENY',
    700: 'THEFT',
    800: 'ASSAULT & BATTERY',
    1000: 'FORGERY/CNTRFT',
    1400: 'VANDALISM',
    1800: 'CONTROLLED DANGEROUS SUBSTANCE POSSESSION',
    1900: 'CONTROLLED DANGEROUS SUBSTANCE IMPLMNT',
    2100: 'JUVENILE RUNAWAY',
    2200: 'LIQUOR - DRINK IN PUB OVER 21',
    2400: 'DISORDERLY CONDUCT',
    2700: 'TRESPASSING',
    2800: 'DRIVING UNDER THE INFLUENCE',
    2900: 'MISC',
}
test_top = top_classes_top
#Master classes without an entry in the mapping keep an empty description
test_top['Master Class Description'] = test_top['Master Class'].map(master_class_descriptions).fillna('')
test_top
Out[432]:
According to Wikipedia (https://en.wikipedia.org/wiki/Violent_crime), violent criminals typically include, but are not limited to, aircraft hijackers, bank robbers, muggers, burglars, terrorists, carjackers, rapists, kidnappers, torturers, active shooters, murderers, gangsters, drug cartels, and others.
Analysing each master class, we can see that only three master classes are considered violent: 500 - BURGLARY, 800 - ASSAULT & BATTERY and 700 - THEFT.
In [439]:
test_top['Violent crime'] = False
test_top.loc[(test_top['Master Class'] == 500),'Violent crime'] = True
test_top.loc[(test_top['Master Class'] == 800),'Violent crime'] = True
test_top.loc[(test_top['Master Class'] == 700),'Violent crime'] = True
test_top.sort_values(['Violent crime', 'frequency'], ascending=False, axis=0, kind='quicksort')
Out[439]:
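The flagging step above sets the same column once per master class. A minimal sketch of an equivalent approach using isin with a list of the violent master classes, on a toy frame whose frequencies are illustrative:

```python
import pandas as pd

# Toy table: master classes with illustrative percentage frequencies
toy = pd.DataFrame({
    "Master Class": [600, 500, 800, 1400, 700],
    "frequency": [30.0, 5.0, 4.0, 8.0, 3.0],
})

# Master classes treated as violent in this analysis
violent_master_classes = [500, 700, 800]

# isin returns True for rows whose master class is in the list
toy["Violent crime"] = toy["Master Class"].isin(violent_master_classes)

# Sum the frequencies of the flagged rows to get the violent share
violent_share = toy.loc[toy["Violent crime"], "frequency"].sum()
print(violent_share)
```

Keeping the violent classes in a single list makes it easier to revise the classification later.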
According to the data, the crimes selected above account for almost 80% of all crimes, while the violent crimes among them account for only:
In [450]:
value_percentage = test_top[test_top['Violent crime'] == True]['frequency'].sum()
value_percentage = round(value_percentage,2)
print(str(value_percentage) + '% of the total crimes')
In [327]:
#Considering the top crimes
day_process = monty_data